8 Introduction to Spatial Data in R

8.1 Set Up

You will return this assignment similar to last week, by working through this lesson in an R Markdown document, answering the Exercises at the end, and knitting as a Word document to submit to Canvas.

Therefore, begin this lesson by creating a a new R Markdown document, and make sure to select output as Word.

8.1.1 Package Installation

To carry out this lesson, you will need to install a couple new R packages to import and work with spatial data. The two main packages for working with spatial data are sf (for vector data) and terra (for spatial data). We will also be using tmap to visualize spatial data and make quick maps, along with the tigris package to import some vector data.

Run the following chunk of code in your console, comment it out, OR add eval = FALSE in the top of the code chunk. You do not want it to be included when you knit the R Markdown document, because it re-install the packages every time you knit.

install.packages("sf")
install.packages("terra")
install.packages("tmap")
install.packages("tigris")

Now we need to read in these packages at the beginning of our workflow. You should have this as an executable code chunk in your R Markdown document.

library(tidyverse)
library(sf)
library(terra)
library(tmap)
library(tigris)

8.1.2 Data download

Second, you will need to download an elevation raster file to carry out this lesson. If you haven’t already, in the R Project you have been using in this class, create a data/ folder. Then, click the download button below, and save the file (elevation.tif) in the data/ folder.

8.2 Spatial Data Formats

Vector Data

  • Locations (points)

    • Coordinates, address, country, city
  • Shapes (lines or polygons)

    • Political boundaries, roads, building footprints, water bodies

Raster Data

  • Images (matrix of cells organized by rows and columns)

    • Satellite imagery, climate, landcover, elevation

8.3 Import and manipulate spatial data

8.3.1 Vector Data

8.3.1.1 tigris

8.3.1.2 Polygons

All the data we are working with in this lesson is confined to the state of Colorado. Let’s start by pulling in political boundaries for Colorado counties with the tigris package, which returns a shapefile consisting of polygons for each county.

# download county shapefile for the state of Colorado
counties <- counties(state = "CO")

The tigris package is one of many data retrieval R packages that uses API calls to pull in data from various online/open databases directly into your R session, without the need to separately download. When you close out your R session, these ‘temp’ files are erased, so it does not use up any of your local storage. At the end of this lesson you will learn how to save shapefiles to your computer if you do in fact want to store and use them in the future (e.g., you manipulated a data set quite a bit and don’t want to re-run the entire process every new R session).

8.3.1.3 Lines

tigris has many other data sets in addition to political boundaries. Today let’s work with another shapefile, importing roads for Larimer county, which returns a polyline dataset for all roads in Larimer County.

roads <- roads(state = "CO", county = "Larimer")

8.3.1.4 tmap

Throughout this lesson we will be using the tmap package to produce quick static or interactive maps. It should be included in your setup script, but if not be sure to add it now.

tmap allows for both static (“plot” mode) and interactive (“view” mode) mapping options, which you can set using the function tmap_mode() . For today we will be making quick interactive plots. Once you set the mode with tmap_mode(), every plot call to tmap after that produces a plot in that mode.

tmap_mode("view")

Lets view our Colorado counties and Larimer County roads shapefiles. To make a “quick thematic map” in tmap you can use the qtm() function. You can also use tm_shape() plus the type of spatial layer (e.g., tm_polygons()) to add your layers to the map if you want to customize the map a little more. Notice how the two following chunks of code produce the same map, but qtm() is much more concise (but limited on customization abilities).

#Using qtm
qtm(counties)
#Using tm_shape
tm_shape(counties)+
  tm_polygons()+
tm_shape(roads)+
  tm_lines()

Rendering the map may take a little while due to relatively large size of the roads object.

Mess around with this map a little bit. See that you can change the basemap, turn layers on and off, and click on features to see their attributes.

Let’s inspect the spatial data sets a little more. What do you see when you run the following line of code:

class(counties)
## [1] "sf"         "data.frame"

8.3.2 sf

By default, the tigris package imports spatial data in sf format, which stands for ‘simple features’. The sf package provides an easy and efficient way to work with vector data, and represents spatial features as a data.frame or tibble with a geometry column, and therefore also works well with tidyverse packages to perform manipulations like you would a data frame.

For example, we are going to do an exercise for the Poudre Canyon Highway, so we want to filter out the roads data set to only those features. Using our investigative geography skills, we find the Poudre highway on the map and find out the ‘FULLNAME’ attribute is “Poudre Canyon Hwy”. We can then use that knowledge to filter() the data set to just that highway:

poudre_hwy <- roads %>% 
  filter(FULLNAME == "Poudre Canyon Hwy")

qtm(poudre_hwy)

8.3.2.1 Points

Most often when you are working with points, you start with an excel file or something similar that consists of the raw latitude and longitude. When you have spatial data that is not explicitly spatial yet or not in the sf format, you use the st_as_sf() function to transform it.

Lets work with a couple locations along the Poudre highway, making a small data frame of their coordinates:

poudre_points <- data.frame(name = c("Mishawaka", "Rustic", "Blue Lake Trailhead"),
                            long = c(-105.35634, -105.58159, -105.85563),
                            lat = c(40.68752, 40.69687, 40.57960))

Now convert it to an sf object, specifying the longitude and latitude columns and the CRS (Coordinate Reference System). Note that ‘x’ (longitude) always goes first followed by ‘y’ (latitude). We use the WGS84 CRS (EPSG code = 4326).

poudre_points_sf <- st_as_sf(poudre_points, coords = c("long", "lat"), crs = 4326)

qtm(poudre_hwy)+
  qtm(poudre_points_sf)

8.3.3 Coordinate Reference Systems

Probably the most important part of working with spatial data is the coordinate reference system (CRS) that is used. In order to analyze spatial data, all objects should be in the exact same CRS.

We can check a spatial object’s CRS by printing it to the console, which will return a bunch of metadata about the object. You can specifically return the CRS for sf objects with st_crs().

# see the CRS in the header metadata:
counties
## Simple feature collection with 64 features and 17 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -109.0602 ymin: 36.99245 xmax: -102.0415 ymax: 41.00344
## Geodetic CRS:  NAD83
## First 10 features:
##     STATEFP COUNTYFP COUNTYNS GEOID     NAME        NAMELSAD LSAD CLASSFP MTFCC CSAFP CBSAFP METDIVFP FUNCSTAT      ALAND
## 23       08      109 00198170 08109 Saguache Saguache County   06      H1 G4020  <NA>   <NA>     <NA>        A 8206547699
## 107      08      115 00198173 08115 Sedgwick Sedgwick County   06      H1 G4020  <NA>   <NA>     <NA>        A 1419419015
## 124      08      017 00198124 08017 Cheyenne Cheyenne County   06      H1 G4020  <NA>   <NA>     <NA>        A 4605713960
## 163      08      027 00198129 08027   Custer   Custer County   06      H1 G4020  <NA>   <NA>     <NA>        A 1913031975
## 200      08      067 00198148 08067 La Plata La Plata County   06      H1 G4020  <NA>  20420     <NA>        A 4376255276
## 228      08      111 00198171 08111 San Juan San Juan County   06      H1 G4020  <NA>   <NA>     <NA>        A 1003660672
## 374      08      097 00198164 08097   Pitkin   Pitkin County   06      H1 G4020   233  24060     <NA>        A 2514104905
## 397      08      093 00198162 08093     Park     Park County   06      H1 G4020   216  19740     <NA>        A 5681165604
## 408      08      003 00198117 08003  Alamosa  Alamosa County   06      H1 G4020  <NA>   <NA>     <NA>        A 1871643031
## 427      08      099 00198165 08099  Prowers  Prowers County   06      H1 G4020  <NA>   <NA>     <NA>        A 4243457240
##       AWATER    INTPTLAT     INTPTLON                       geometry
## 23   4454510 +38.0316514 -106.2346662 MULTIPOLYGON (((-106.8714 3...
## 107  3530746 +40.8715679 -102.3553579 MULTIPOLYGON (((-102.6521 4...
## 124  8166129 +38.8356456 -102.6017914 MULTIPOLYGON (((-102.5769 3...
## 163  3364150 +38.1019955 -105.3735123 MULTIPOLYGON (((-105.7969 3...
## 200 25642579 +37.2873673 -107.8397178 MULTIPOLYGON (((-108.2952 3...
## 228  2035929 +37.7810492 -107.6702567 MULTIPOLYGON (((-107.9751 3...
## 374  6472577 +39.2175407 -106.9161621 MULTIPOLYGON (((-106.9154 3...
## 397 43519329 +39.1189141 -105.7176479 MULTIPOLYGON (((-105.9751 3...
## 408  1670455 +37.5684423 -105.7880414 MULTIPOLYGON (((-106.0393 3...
## 427 15317423 +37.9581814 -102.3921613 MULTIPOLYGON (((-102.2111 3...
#return just the CRS (more detailed)
st_crs(counties)
## Coordinate Reference System:
##   User input: NAD83 
##   wkt:
## GEOGCRS["NAD83",
##     DATUM["North American Datum 1983",
##         ELLIPSOID["GRS 1980",6378137,298.257222101,
##             LENGTHUNIT["metre",1]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["latitude",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["longitude",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     ID["EPSG",4269]]

You can check if two objects have the same CRS like this:

st_crs(counties) == st_crs(poudre_points_sf)
## [1] FALSE

Uh oh, the CRS of our points and lines doesn’t match. While tmap performs some on-the-fly transformations to map the two layers together, in order to do any analyses with these objects you’ll need to re-project one of them. You can project one object’s CRS to that of another with st_transform like this:

poudre_points_prj <- st_transform(poudre_points_sf, crs = st_crs(counties))

#Now check that they match
st_crs(poudre_points_prj) == st_crs(counties)
## [1] TRUE

8.3.4 Raster Data

Earlier in this lesson you downloaded a raster file for the elevation of Colorado. Make sure that file elevation.tif is in the data/ folder of your R Project, and read the raster file in the rast() from the terra package like this:

elevation <- rast("data/elevation.tif")

Make a quick plot to see the elevation layer:

qtm(elevation)

By default, tmap uses a categorical symbology to color the cells by elevation. You can change that to a continuous palette with tm_raster() like this:

tm_shape(elevation)+
  tm_raster(style = "cont", title = "Elevation (m)")

Let’s inspect this raster layer a little. By printing the object name to the console we see a bunch of metadata like resolution (cell size), extent, CRS, and file name.

elevation
## class       : SpatRaster 
## dimensions  : 819, 1434, 1  (nrow, ncol, nlyr)
## resolution  : 0.004895063, 0.004895063  (x, y)
## extent      : -109.0609, -102.0414, 36.99411, 41.00317  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat NAD83 (EPSG:4269) 
## source      : elevation.tif 
## name        : elevation 
## min value   :      1018 
## max value   :      4288

8.3.5 terra

We can use the terra package to work with raster data. For example, say we only want to see elevation along the Poudre highway. We can use crop to crop the raster to the extent of our roads shapefile (to get all of Larimer County), using the ext() function to get the extent of that spatial object.

# If we try this, we get an error
elevation_crop <- crop(elevation, ext(poudre_hwy))

Lets make a final map with all the spatial data we created:

qtm(elevation_crop)+
  qtm(poudre_hwy)+
  qtm(poudre_points_prj)

8.4 Reading and Writing Spatial Data

8.4.1 Writing spatial data

All of the spatial data we’ve created are only saved as objects in our environment. To save the data to disk, the sf and terra packages have functions to do so. You are not required to save these files, but if you want to follow along with these functions save the data to the data/ folder you created at the beginning of this lesson.

To save vector data with sf, use write_sf()

write_sf(poudre_hwy, "data/poudre_hwy.shp")

write_sf(poudre_points_prj, "data/poudre_points.shp")

While you can give the file any name you want, note that you must put ‘.shp’ as the extension of the file.

After saving the above files, check your data/ folder and notice the other auxiliary files saved with it (i.e., not just .shp). It is VERY important that whenever you share shapefiles, all the auxiliary files are saved with it, so often shapefiles are transferred via .zip folders. However, when reading shapefiles into R (see below) you only specify the file with the ‘.shp’ extension. As long as all the other auxiliary files are saved in that same folder, it will read in the shapefile correctly.

To save raster data with terra use writeRaster()

writeRaster(elevation_crop, "data/elevation_larimer.tif")

Same as with the vector data, when saving raster data you must add the ‘.tif’ file extension to the name. There are various formats raster data can be stored as (e.g., ASCII, ESRI Grid) but GeoTiffs are the most common and generally easiest to deal with in R.

8.4.2 Reading Spatial Data

To read in shapefiles, you use read_sf() . Note that this line of code will only run if you’ve saved your poudre_hwy object above with write_sf.

poudre_hwy <- read_sf("data/poudre_hwy.shp")

8.5 Before Rendering!

Before rendering this assignment to a Word document, remember Word does not work with any interactive visualizations. Therefore, go back to the beginning of your workflow where you set tmap_mode(), and instead change it to static plotting with the following line of code:

tmap_mode("plot")

Alternatively, you can set any code chunk with qtm() in it (if you are still setting tmap_mode("view), to eval = FALSE.

8.6 Exercises

Answer the following questions in you R Markdown document (also paste the questions above each answer). When your R Markdown document is complete, render it to a Word document and upload that to Canvas.

1. Filter out the counties data set to only include Larimer, Denver, and Pueblo counties.

2. Lets look at the attributes for each county in our counties dataset:

names(counties)
##  [1] "STATEFP"  "COUNTYFP" "COUNTYNS" "GEOID"    "NAME"     "NAMELSAD" "LSAD"     "CLASSFP"  "MTFCC"    "CSAFP"    "CBSAFP"  
## [12] "METDIVFP" "FUNCSTAT" "ALAND"    "AWATER"   "INTPTLAT" "INTPTLON" "geometry"

We have one variable in here AWATER that is the total area of surface water each county has. Say you want to spatially visualize the variation in surface water area among counties. Looking at the arguments you can use in qtm():

?qtm

There is an argument fill =, where we can specify a variable in the dataset to color the polygons by. Use qtm() to make a map of counties that is colored by AWATER. Which county has the largest area of surface water?

3. Write two lines of code to retrieve the CRS from 1) counties and 2) elevation.

4. extract() is a function in the terra package to extract raster values at specified spatial features (e.g., points, summarize within polygons). Run the following line of code, which add a new column to poudre_points_prj called elevation that is the extracted elevation at each site.

poudre_points_prj$elevation <-  extract(elevation, poudre_points_prj)[,2]

Make a barplot that compares the elevation at each of the 3 sites. Hint : use geom_col() with ggplot(). Which site has the highest elevation?